Automated Test: kafka-consumer-parallel-after #319
Conversation
One potential problem we have with batch processing is that any one slow item will clog up the whole batch. This PR implements a queueing method instead, where we keep N queues that each have their own workers. There's still a chance of individual items backlogging a queue, but we can try increased concurrency here to reduce the chances of that happening.
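As a rough illustration of the idea (not the PR's actual implementation — class names like `FixedQueuePool` come from the PR's diagram, but the sketch below is an assumption), routing by a hash of the group key keeps each group's items on a single queue while letting other groups keep flowing past a slow item:

```python
import queue
import threading
from collections.abc import Callable
from typing import Any


class SimpleQueuePool:
    """Toy sketch: route items to one of N queues by group key so that items
    for the same group stay ordered, while different groups can make progress
    even if one item is slow."""

    def __init__(self, num_queues: int, process: Callable[[str, Any], None]) -> None:
        self.queues: list[queue.Queue] = [queue.Queue() for _ in range(num_queues)]
        self.process = process
        self.workers = [
            threading.Thread(target=self._worker, args=(q,), daemon=True)
            for q in self.queues
        ]
        for w in self.workers:
            w.start()

    def submit(self, group_key: str, item: Any) -> None:
        # The same group key always hashes to the same queue, preserving per-group order.
        self.queues[hash(group_key) % len(self.queues)].put((group_key, item))

    def _worker(self, q: queue.Queue) -> None:
        while True:
            group_key, item = q.get()
            self.process(group_key, item)
            q.task_done()


# Example: two queues; items for the same (hypothetical) monitor stay ordered on one queue.
pool = SimpleQueuePool(num_queues=2, process=lambda key, item: print(key, item))
pool.submit("monitor-a", {"status": "ok"})
pool.submit("monitor-b", {"status": "timeout"})
```

The per-group hash is what gives the per-group ordering guarantee described in the walkthrough below.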
📝 Walkthrough

The PR introduces a thread-queue-parallel processing mode for Kafka result consumption. It adds a modular in-process framework with per-partition offset tracking, multiple worker queues with per-group ordering guarantees, and integrates this into the results consumer factory. Comprehensive tests validate queue mechanics, offset management, ordering semantics, and end-to-end integration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant SubmitHandler as submit()
    participant Decoder as decoder
    participant GroupingFn as grouping_fn
    participant QueuePool as FixedQueuePool
    participant WorkQueue as WorkQueue<br/>(per group)
    participant Worker as OrderedQueueWorker
    participant Processor as result_processor
    participant Tracker as OffsetTracker
    participant CommitLoop as _commit_loop
    participant Commit as commit_function

    Client->>SubmitHandler: submit(message)
    SubmitHandler->>Decoder: decode(payload)
    Decoder-->>SubmitHandler: result
    SubmitHandler->>GroupingFn: get_group_key(result)
    GroupingFn-->>SubmitHandler: group_key
    SubmitHandler->>Tracker: add_offset(partition, offset)
    SubmitHandler->>QueuePool: submit(group_key, WorkItem)
    QueuePool->>QueuePool: get_queue_for_group(group_key)
    QueuePool->>WorkQueue: put(WorkItem)
    Worker->>WorkQueue: get(WorkItem)
    Worker->>Processor: process(group_key, result)
    Processor-->>Worker: ✓
    Worker->>Tracker: complete_offset(partition, offset)
    CommitLoop->>CommitLoop: periodic timer
    CommitLoop->>Tracker: get_committable_offsets()
    Tracker-->>CommitLoop: dict[partition→offset]
    CommitLoop->>Commit: offsets_dict
    Commit->>Tracker: mark_committed(partition, offset)
```
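The two ends of that diagram can be sketched in a few lines. The names (`decoder`, `grouping_fn`, `OffsetTracker`, `result_processor`, `WorkItem`) are taken from the diagram; the function bodies are illustrative assumptions, not the PR's code:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class WorkItem:
    # Hypothetical shape; the PR's actual WorkItem may carry different fields.
    partition: int
    offset: int
    result: Any


def submit(message: Any, decoder, grouping_fn, offset_tracker, queue_pool) -> None:
    """Producer side of the diagram: decode, derive the group key, register the
    offset as outstanding, then enqueue onto the per-group work queue."""
    result = decoder.decode(message.payload)
    group_key = grouping_fn(result)
    offset_tracker.add_offset(message.partition, message.offset)
    queue_pool.submit(group_key, WorkItem(message.partition, message.offset, result))


def worker_loop(work_queue, result_processor, offset_tracker) -> None:
    """Consumer side of the diagram: process each item, then mark its offset
    complete so the commit loop can advance the committable watermark."""
    while True:
        group_key, item = work_queue.get()
        result_processor(group_key, item.result)
        offset_tracker.complete_offset(item.partition, item.offset)
```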
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/sentry/remote_subscriptions/consumers/queue_consumer.py`:
- Around line 67-98: The get_committable_offsets implementation currently uses
range(start, max_offset+1) which can iterate over many absent offsets; replace
that range-based scan with an iteration over only the present offsets (sort the
partition's all_offsets and filter >= start) and detect contiguity by tracking
the next expected offset. Specifically, in get_committable_offsets, for each
partition use sorted_offsets = sorted(o for o in all_offsets if o >= start), set
expected = start, then loop sorted_offsets: if offset == expected and offset not
in outstanding update highest_committable and expected += 1 else break; keep
using last_committed, outstanding, all_offsets and highest_committable as
before.
🧹 Nitpick comments (3)

tests/sentry/uptime/consumers/test_results_consumer.py (2)

2017-2017: Consider using a `ClassVar` annotation for the class-level mutable attribute. Static analysis suggests annotating `pytestmark` with `ClassVar` for clarity.

```diff
+from typing import ClassVar
+
 class ProcessResultThreadQueueParallelKafkaTest(UptimeTestCase):
-    pytestmark = [requires_kafka]
+    pytestmark: ClassVar = [requires_kafka]
```

2142-2146: Log the exception during cleanup for better debugging. Silently catching exceptions during cleanup can mask issues during test debugging. Consider logging the exception.

🔧 Proposed fix

```diff
         finally:
             try:
                 admin_client.delete_topics([test_topic])
-            except Exception:
-                pass
+            except Exception as e:
+                logger.warning("Failed to delete test topic %s: %s", test_topic, e)
```

src/sentry/remote_subscriptions/consumers/queue_consumer.py (1)

344-345: The `timeout` parameter in `join()` is ignored. The `join` method ignores the `timeout` parameter and just calls `close()`. Looking at other processing strategies in the codebase (e.g., `SpanFlusher.join`), they typically honor the timeout parameter for graceful shutdown coordination. Consider passing the timeout through to the underlying operations:

🔧 Proposed fix

```diff
     def join(self, timeout: float | None = None) -> None:
-        self.close()
+        self.shutdown_event.set()
+        if timeout is not None:
+            self.commit_thread.join(timeout=timeout / 2)
+            # Use remaining time for queue pool
+            self.queue_pool.wait_until_empty(timeout=timeout / 2)
+        else:
+            self.commit_thread.join(timeout=5.0)
+        self.queue_pool.shutdown()
```
`src/sentry/remote_subscriptions/consumers/queue_consumer.py` (lines 67-98):

```python
def get_committable_offsets(self) -> dict[Partition, int]:
    """
    Get the highest offset per partition that can be safely committed.

    For each partition, finds the highest contiguous offset that has been processed.
    """
    committable = {}
    for partition in list(self.all_offsets.keys()):
        with self._get_partition_lock(partition):
            all_offsets = self.all_offsets[partition]
            if not all_offsets:
                continue

            outstanding = self.outstanding[partition]
            last_committed = self.last_committed.get(partition, -1)

            min_offset = min(all_offsets)
            max_offset = max(all_offsets)

            start = max(last_committed + 1, min_offset)

            highest_committable = last_committed
            for offset in range(start, max_offset + 1):
                if offset in all_offsets and offset not in outstanding:
                    highest_committable = offset
                else:
                    break

            if highest_committable > last_committed:
                committable[partition] = highest_committable

    return committable
```
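To see what the contiguity rule does in practice, here is the same scan pulled out as a standalone function and exercised on made-up data (an illustration only, not code from the PR):

```python
def highest_committable(all_offsets: set[int], outstanding: set[int], last_committed: int) -> int:
    """Standalone version of the scan above: walk forward from the last committed
    offset and stop at the first missing or still-outstanding offset."""
    if not all_offsets:
        return last_committed
    start = max(last_committed + 1, min(all_offsets))
    result = last_committed
    for offset in range(start, max(all_offsets) + 1):
        if offset in all_offsets and offset not in outstanding:
            result = offset
        else:
            break
    return result


# Offsets 10-14 were seen, 13 is still being processed, nothing committed yet:
# only 10-12 are safe to commit; 14 must wait until 13 completes.
assert highest_committable({10, 11, 12, 13, 14}, {13}, -1) == 12
# Once 13 finishes, the watermark can advance to 14.
assert highest_committable({10, 11, 12, 13, 14}, set(), 12) == 14
```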
🧩 Analysis chain
🌐 Web query:
Kafka partition offsets contiguous gaps sparsity
💡 Result:
Kafka offsets are per-partition, monotonically increasing log positions, but they are not guaranteed to be “dense” (i.e., you should not assume every integer offset corresponds to an application-visible record).
Why you can see gaps / sparsity in a partition’s offsets:
- Transactional control records (“commit/abort markers”) consume offsets but are not delivered to applications. Kafka writes control records into the log, and consumers use them to handle transactions; they “take up” offsets even though your app never sees them. [1]
- Aborted transactional records can be skipped (especially with `read_committed`). With `READ_COMMITTED`, clients are given enough information (e.g., aborted-transaction metadata; LSO/last stable offset concepts) to discard ABORTED transactional data, which can make offsets appear to jump. [2]
- Log compaction removes records but does not renumber offsets. After compaction, “missing” offsets remain valid positions; a fetch starting at a compacted-away offset returns the next higher offset that still exists. This creates long-lived holes in older parts of the log. [3]
- At-least-once / retries / duplicates don’t create gaps, but they break “offset == message count” assumptions. Extra physical records (including duplicates and transaction markers) mean offsets can advance faster than your perceived logical stream. [4]
Practical takeaway: treat offsets as opaque positions used for ordering and checkpointing, not as a contiguous sequence of “real messages.” The only safe assumption is ordering within a partition and uniqueness of offsets within that partition, not density. [3][4]
Sources:
[1] https://kafka.apache.org/41/implementation/message-format/
[2] https://kafka.apache.org/25/design/protocol/
[3] https://docs.confluent.io/kafka/design/log_compaction.html
[4] https://stackoverflow.com/questions/54636524/kafka-streams-does-not-increment-offset-by-1-when-producing-to-topic
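To illustrate that takeaway at the consumer API level, here is a minimal, hypothetical confluent-kafka sketch (broker address, group id, and topic are placeholders) that commits the next position after a processed message rather than assuming offsets are dense:

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # placeholder broker
        "group.id": "uptime-results-example",   # hypothetical group id
        "enable.auto.commit": False,
        "isolation.level": "read_committed",    # aborted txn data is skipped; offsets may jump
    }
)
consumer.subscribe(["uptime-results"])          # hypothetical topic

msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    # ... process the message ...
    # Commit the next position after this message; do not try to "count" records.
    consumer.commit(
        offsets=[TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
        asynchronous=False,
    )
consumer.close()
```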
Performance concern with sparse offset iteration.
The `get_committable_offsets` method iterates through `range(start, max_offset + 1)`, which could be expensive if offsets are very sparse. Kafka offsets are not guaranteed to be contiguous — gaps can occur due to transactional control records, aborted transactions, log compaction, and other factors. While the loop breaks on the first gap (limiting the iteration to the first contiguous block), partitions with significantly sparse offset regions could still impact performance. If this becomes a bottleneck, consider tracking offsets in a sorted data structure to avoid iterating through absent offsets.
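A sketch of that suggested direction (illustrative only; it keeps the semantics of the snippet above while visiting only offsets that actually exist):

```python
def highest_committable_sorted(
    all_offsets: set[int], outstanding: set[int], last_committed: int
) -> int:
    """Scan only the offsets that are present, in sorted order, and advance an
    expected counter to detect the first gap. Cost is O(k log k) in the number
    of tracked offsets rather than O(max_offset - start)."""
    if not all_offsets:
        return last_committed
    # Same starting point as the original: skip anything at or below the last
    # committed offset, but still allow jumping a leading gap.
    start = max(last_committed + 1, min(all_offsets))
    expected = start
    result = last_committed
    for offset in sorted(o for o in all_offsets if o >= start):
        if offset == expected and offset not in outstanding:
            result = offset
            expected += 1
        else:
            break
    return result


# With sparse offsets, only the present values are visited:
assert highest_committable_sorted({5, 1_000_000}, set(), -1) == 5
```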
This pull request was automatically created by @coderabbitai/e2e-reviewer. Batch created pull request.
Summary by CodeRabbit
Release Notes
New Features
Refactor
Tests